ABSTRACT

We report the discovery of a highly significant concentration of galaxies at a redshift 
of $\langle z \rangle = 3.090$. The structure is evident in a redshift histogram of photometrically
selected "Lyman break" objects in a 9' by 18' field in which we have obtained 78
spectroscopic redshifts in the range 2.0 < z < 3.4. The dimensions of the structure
projected on the plane of the sky are at least 11' by 8', or $14\,h_{70}^{-1}$ by $10\,h_{70}^{-1}$ Mpc
(comoving; $\Omega_M = 1$). The concentration contains 15 galaxies and one faint ($\mathcal{R} = 21.7$)
QSO. We consider the structure in the context of a number of cosmological models 
and argue that Lyman-break galaxies must be very biased tracers of mass, with an 
effective bias on mass scale $M \sim 10^{15} M_\odot$ ranging from $b \sim 2$ for $\Omega_M = 0.2$ to $b \gtrsim 6$
for $\Omega_M = 1$. In a Cold Dark Matter scenario the large bias values suggest that
individual Lyman-break galaxies are associated with dark halos of mass $M \sim 10^{12} M_\odot$,
reinforcing the interpretation of these objects as the progenitors of massive galaxies at
the present epoch. Preliminary results of spectroscopy in additional fields suggest that
such large structures are common at z ~ 3, with about one similar structure per survey 
field. The implied space density is consistent with the possibility that we are observing 
moderately rich clusters of galaxies in their early non-linear evolution. Finally, the 
spectrum of one of the QSOs discovered in our survey ($z_{\rm em} = 3.356$) exhibits metal line
absorption systems within the 3 redshift bins having the largest number of galaxies in
the field, z = 2.93, 3.09, and 3.28. These results are the first from an ongoing "targeted"
redshift survey designed to explore the nature and distribution of star-forming galaxies
in the redshift range $2.7 \lesssim z \lesssim 3.4$.

Subject headings: galaxies: evolution - galaxies: formation - galaxies: distances and
redshifts - large scale structure of the universe

1. INTRODUCTION 

The large-scale distribution of galaxies at early epochs provides a powerful means of 
discriminating amongst various cosmological world models and mechanisms for the formation of 
structure (see, e.g., White 1996 and references therein). The most comprehensive surveys of the
large-scale distribution of galaxies have been carried out in the relatively "local" universe (e.g.,
Shectman et al. 1996), while hints of what is happening at larger redshifts ($z \lesssim 0.8$) have been
obtained using pencil-beam apparent-magnitude limited redshift surveys (e.g., Broadhurst et 
al. 1990, Carlberg et al. 1997, Le Fevre et al. 1996, Connolly et al. 1996, Cohen et al. 1996a,b). A 
general result seems to be that the (small scale) correlation function of the more distant galaxies 
is reduced significantly in amplitude relative to the present time (e.g., Efstathiou 1995, Brainerd 
et al. 1995, Le Fevre et al. 1996, etc.), while on larger scales the one-dimensional, "line-of-sight" 
structures appear to be quite prominent to at least z ~ 1, with a typical comoving distance
between structures of $\sim 100\,h^{-1}$ Mpc along the line of sight. It is not yet clear how these two
observational results should be combined to form a coherent picture, since it is so difficult to 
obtain information in the "transverse" direction in surveys of faint galaxies, so that the nature 
of the structures giving rise to the "spikes" in one-dimensional redshift surveys is ambiguous 
(Kaiser & Peacock 1991). Locally, at least, it appears that the line of sight structures seen in the
pencil-beam surveys are likely to be related to the cell-like "wall-void" geometry seen in more
extensive, large solid-angle redshift surveys (e.g., de Lapparent et al. 1986, Landy et al. 1996). 

In addition to the difficulties presented by the "geometry" of very deep redshift surveys, it 
is not clear to what extent galaxies are reliable tracers of mass, particularly at high redshifts 
where a large degree of "bias" is expected for objects forming within massive dark matter halos 
(e.g., Bardeen et al. 1986, Brainerd & Villumsen 1992, Mo & Fukugita 1996, Baugh et al. 1997). 
However, a measurement of the bias of a class of objects at high redshift (and the evolution of the 
bias with redshift) can be used to constrain the connection between the sites of their formation 
and the overall mass distribution, which can in turn be used to constrain models of structure and
galaxy formation (e.g., Cole & Kaiser 1989, Mo & White 1996). 

In this paper, we present the first results of a large survey of the galaxy distribution at 
z ~ 3. Efficient photometric selection of star-forming objects at high redshift (Steidel, Pettini, &
Hamilton 1995; Steidel et al. 1996a,b) and subsequent spectroscopy on the W.M. Keck telescopes
allow an assessment of the growth of structure in the galaxy distribution at significantly higher
redshift than has been previously accessible. In a separate paper (Giavalisco et al. 1997), we 
analyze the angular correlation function of the photometrically selected z ~ 3 galaxy candidates 
in a number of fields. Here we focus on a single field in which we have obtained the most complete 
spectroscopic observations to date, with data analogous to "pencil beam" surveys at smaller 
redshifts. Aside from probing the structures traced by galaxies at significantly earlier epochs than 
has been possible previously, working at z ~ 3 also has the advantage that a reasonable angular 
scale for multi-object spectroscopy on large telescopes maps onto relatively large co-moving scales; 
our field size of 9' by 18' at z ~ 3 traces (transverse) structure equivalent to a field 24' by 48' at 
z ~ 0.5. As we will discuss below, the relatively large transverse co-moving scale provides a 
distinct advantage in assessing the galaxy clustering properties on scales that remain in the linear 
regime to the present day, allowing for a relatively straightforward analytic treatment. 
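
The mapping between angular scales at different redshifts can be checked with a short calculation. The sketch below assumes an Einstein-de Sitter ($\Omega_M = 1$) cosmology with $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, which the passage does not state explicitly for this comparison; the function names are illustrative only.

```python
import math

C_KM_S = 299792.458  # speed of light [km/s]


def comoving_distance_eds(z, hubble=70.0):
    """Line-of-sight comoving distance [Mpc] for Omega_M = 1 (assumed cosmology)."""
    return 2.0 * (C_KM_S / hubble) * (1.0 - 1.0 / math.sqrt(1.0 + z))


def equivalent_angle(arcmin, z_from, z_to):
    """Angle at z_to spanning the same transverse comoving length as `arcmin` at z_from."""
    return arcmin * comoving_distance_eds(z_from) / comoving_distance_eds(z_to)


def transverse_comoving_mpc(arcmin, z, hubble=70.0):
    """Transverse comoving length [Mpc] subtended by `arcmin` at redshift z."""
    return math.radians(arcmin / 60.0) * comoving_distance_eds(z, hubble)


# The 9' by 18' field at z = 3, expressed as an equivalent field at z = 0.5.
field_at_z05 = [equivalent_angle(side, 3.0, 0.5) for side in (9.0, 18.0)]
print([round(a, 1) for a in field_at_z05])
```

Under these assumptions the equivalent field at z = 0.5 comes out close to the 24' by 48' quoted above; other cosmologies would shift the ratio somewhat.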

In this paper we use the first observations of relatively large scale structure traced
by star forming galaxies at z ~ 3 to estimate the amount by which this population must be 
biased relative to the overall mass distribution expected for standard cosmological models. The 
measurement of this bias will allow us to infer a corresponding dark matter halo mass scale, and 
makes possible future direct comparisons with gravitational N-body and semi-analytic simulations 
of early structure formation. We will show that the implied halo mass scale of the "Lyman break 
galaxies" supports the conclusion that they likely represent the progenitors of the massive galaxies 
of the present epoch (Steidel et al. 1996a, Giavalisco, Steidel, & Macchetto 1996, Mo & Fukugita
1996, Baugh et al. 1997). 

2. OBSERVATIONS 

The present work focuses on one of the fields in a survey for z ~ 3 galaxies based on 
ground-based images in the $U_n$, $G$, and $\mathcal{R}$ photometric system (Steidel & Hamilton 1993), which
is designed to be sensitive to the Lyman break in objects having redshifts primarily in the range 
2.7 < z < 3.4. A complete description of the observations and techniques used in this survey will 
be published elsewhere (Steidel et al. 1997). The field under discussion consists of two adjacent
pointings of the Palomar 5m Hale telescope with the COSMIC prime focus camera. The camera 
uses a thinned, AR-coated Tektronix 2048 x 2048 CCD with a scale of 0.283" per pixel. The
images were obtained in 1995 August and 1996 August, and typically reach $1\sigma$ surface brightness
limits of 29 (AB) mag arcsec$^{-2}$ in each passband. The northern field is centered on the "SSA22"
deep redshift survey field of Cowie et al. (1996), at $\alpha = 22^{\rm h}17^{\rm m}34.0^{\rm s}$, $\delta = +00^\circ 15' 01''$ (J2000),
and includes most of a nearby region observed as part of the Canada-France redshift survey (Lilly
et al. 1995). The southern field is centered 525" south of that position. The size of the full
contiguous field is 17.64' in the N-S direction and 8.74' in the E-W direction. We refer to the two
separate pointings as "SSA22a" and "SSA22b" for the northern and southern fields, respectively.

Spectroscopic observations of "Lyman break" candidates were obtained using the Keck I 
telescope in 1995 October and 1996 August, and using the Keck II telescope in its first week of 
scientific observations during 1996 October. We obtained spectra of as many candidates as possible 
by using 7 different slit masks with the Low Resolution Imaging Spectrograph (Oke et al. 1995), 
each covering roughly 4' by 7' regions and including ~ 15-20 candidate objects per mask. Typical 
total exposure times were 1.5-3 hours per mask, in separate 1800s integrations. The resolution 
of the spectra, using a 300 line mm$^{-1}$ grating, was approximately 12 Å, and the grating tilt was
adjusted so that the wavelength coverage for each slit included the 4300-7000 Å range. Objects
were assigned slits based on a number of factors, the most important being the maximization 
of the number of z ~ 3 candidates on each mask. Objects were prioritized on the basis of how 
unambiguously the Lyman discontinuity could be discerned from the broad-band photometry. 
About two-thirds of the objects discussed here satisfied our "robust" color criteria (see Steidel et
al. 1995), $\mathcal{R} < 25.5$, $(G - \mathcal{R}) < 1.2$, and $(U_n - G) > (G - \mathcal{R}) + 1.5$, while the rest were selected using
our "marginal" criteria, $\mathcal{R} < 25.5$, $(G - \mathcal{R}) < 1.2$, and $(G - \mathcal{R}) + 1.0 < (U_n - G) < (G - \mathcal{R}) + 1.5$.
In practice, most of the "marginal" candidates that have been observed spectroscopically also
have $U_n - G > 1.6$ in order to maximize the number of galaxies in the desired redshift range,
2.7 < z < 3.4. A total of 181 objects satisfy these adopted color criteria in the survey region, of 
which 113 satisfy the "robust" color criteria. Details of the selection criteria and the resulting 
redshift selection function will be presented elsewhere (Steidel et al. 1997; Dickinson et al. 1997). 
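
The two sets of color cuts can be written as a small classifier. This is only a sketch of the criteria as stated above: the function name and argument order are hypothetical, and assigning an object that sits exactly on the $(G - \mathcal{R}) + 1.5$ boundary to the "marginal" class is an assumption, since the text uses strict inequalities on both sides.

```python
def classify_candidate(un, g, r, r_limit=25.5, max_g_minus_r=1.2):
    """Classify a Lyman-break candidate from its Un, G, R magnitudes.

    Returns "robust", "marginal", or None (not selected).
    """
    g_minus_r = g - r
    un_minus_g = un - g

    # Cuts shared by both categories: R < 25.5 and (G - R) < 1.2.
    if r >= r_limit or g_minus_r >= max_g_minus_r:
        return None

    # Robust: (Un - G) > (G - R) + 1.5.
    if un_minus_g > g_minus_r + 1.5:
        return "robust"

    # Marginal: (G - R) + 1.0 < (Un - G) < (G - R) + 1.5
    # (the exact upper boundary is placed here by assumption).
    if un_minus_g > g_minus_r + 1.0:
        return "marginal"

    return None
```

For example, an object with $\mathcal{R} = 24.5$, $G - \mathcal{R} = 0.5$ and $U_n - G = 2.5$ is classified as "robust", while the same object with $U_n - G = 1.8$ is "marginal".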

Approximately 80% of the objects assigned slits resulted in redshifts, with 78 objects having 
z > 2 and 59 having 2.7 < z < 3.4. With the exception of a small amount of contamination 
by Galactic stars (approximately 5% of the objects satisfying the color criteria turn out to be 
Galactic stars, although there is essentially no contamination by stars for objects fainter than 
$\mathcal{R} = 24$), all identified color-selected objects are at high redshift (z > 2). Some of the spectra
which remain unidentified may be objects at somewhat lower redshift than our primary selection 
window, since then the strongest spectral features would fall in regions of much lower instrumental 
sensitivity; however, there is no obvious trend with either $U_n - G$ color or apparent $\mathcal{R}$ magnitude
for the unsuccessful redshifts. A higher success rate was achieved during 1996 October due to 
much improved photometry and (therefore) photometric selection, and also higher quality slit 
masks. Nine of the objects with confirmed redshifts z > 2 (most having 2.0 < z < 2.5) were 
subsequently shown not to satisfy either of the above selection categories on the basis of the 
improved photometry; in addition, two z > 2 objects were found serendipitously on slits designed 
for color-selected objects. For consistency, in the present paper we confine ourselves to only the 
67 z > 2 objects satisfying the adopted photometric selection criteria given above, for which an 
accurate redshift selection function has been determined from the results of our spectroscopy in a 
number of different fields.

The redshifts of the galaxies in the sample were measured from strong absorption features
in the rest-frame far-UV, mostly of interstellar origin (see Steidel et al. 1996a, Lowenthal et
al. 1997) and, when present, the Lyman $\alpha$ emission line. For spectra in which both absorption and
emission are present, the redshifts defined by Lyman $\alpha$ emission are systematically higher than
the absorption line redshifts by $\langle z_{\rm Ly\alpha} - z_{\rm abs} \rangle = 0.008 \pm 0.004$, or ~ 600 ± 300 km s$^{-1}$. Based on
limited near-IR spectroscopy of some of the high redshift galaxies (Pettini et al. 1997) it appears
that neither the emission lines nor the absorption lines actually coincide with the true systemic
redshift of the galaxy; the "true" redshift probably lies in between $z_{\rm abs}$ and $z_{\rm Ly\alpha}$. This uncertainty
probably dominates over any measurement uncertainties, and so we adopt the standard deviation
of the velocity difference as the typical uncertainty of the measured redshifts, ~ 300 km s$^{-1}$.
Because most of the spectra exhibit Lyman $\alpha$ emission at some level, whereas a smaller number
of spectra exhibit only absorption features, we have adopted the position of Lyman $\alpha$ emission as
the primary redshift indicator.
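
The conversion between a small redshift difference and a rest-frame velocity difference is $\Delta v \simeq c\,\Delta z/(1+z)$. The sketch below applies it to the emission-absorption offset; evaluating it at z = 3.09 is an assumption made for illustration, since the sample spans a range of redshifts.

```python
C_KM_S = 299792.458  # speed of light [km/s]


def velocity_offset_km_s(delta_z, z):
    """Rest-frame velocity difference [km/s] for a small redshift difference at redshift z."""
    return C_KM_S * delta_z / (1.0 + z)


# Mean offset and scatter between Lyman-alpha emission and absorption redshifts.
offset = velocity_offset_km_s(0.008, 3.09)
scatter = velocity_offset_km_s(0.004, 3.09)
print(round(offset), round(scatter))
```

The same function gives the velocity width of a redshift bin: an interval of 0.04 at z ~ 3 corresponds to roughly 3000 km s$^{-1}$.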

A histogram of the z > 2 galaxies (and the two faint QSOs which were discovered using the 
same photometric technique) in the SSA22a+b field sample is given in Figure 1, with a bin size 
of $\Delta z = 0.04$, or $\Delta v \sim 3000$ km s$^{-1}$ at z ~ 3. The distribution of the objects on the plane of the
sky for both the spectroscopically confirmed and photometric candidate z ~ 3 samples is shown in 
Figure 2. 

The most striking feature in the redshift histogram is the "spike" of 15 objects at z ~ 3.1. In 
fact, one of the two z > 2 objects whose spectrum was obtained serendipitously also falls within 
the same redshift bin. The spectra of all 16 objects in this narrow redshift range are plotted in 
Figure 3, and the relative positions and photometry for the same objects are summarized in Table
1. We will not include the serendipitous object SSA22a S1 in the analysis that follows, but it is
intriguing that an object with strong Lyman $\alpha$ emission (but very weak continuum) was found
despite the very small effective solid angle covered by the mask slitlets. 

3. IMPLICATIONS OF THE GALAXY OVERDENSITY 

While spectroscopy in several fields suggests that our $U_n G \mathcal{R}$ selection criteria find galaxies
over the broad range $2.7 \lesssim z \lesssim 3.4$ (Steidel et al. 1997), in this field, the one we have sampled most
densely, nearly one quarter of the redshifts lie between z = 3.074 and z = 3.108. A concentration 
this strong is unlikely to arise from Poisson sampling of our selection function — using the statistical 
technique described in the appendix we find that only 1 in 400 data sets generated by randomly 
drawing redshifts from our selection function (Figure 1) contain as significant a departure from 
Poisson expectations — and so this group of galaxies almost certainly indicates a true peak in 
the density field at z ~ 3.1. From Figure 1 the approximate galaxy overdensity at this redshift 
is $\delta_{\rm gal} \sim 3.5$. At first glance this seems surprisingly large: in the local universe the rms mass
fluctuation in spheres of radius $8\,h_{100}^{-1}$ Mpc is $\langle(\delta M/M)^2\rangle^{1/2} = \sigma_8 \sim 0.6$, and since (for $\Omega_M = 1$)
each bin in the histogram corresponds to a comoving volume of $\sim 8 \times 15 \times 15\,h_{100}^{-3}$ Mpc$^3$ we expect
the relative mass fluctuation among bins $\sigma_{\rm mass}$ to be approximately $\sigma_8/(1 + z) \sim 0.15$. Yet the bin
at z ~ 3.1 contains a galaxy overdensity $\delta_{\rm gal}$ that is 25 times larger than the expected $\sigma_{\rm mass}$. In
this section we will see what we can learn from the existence of this single large galaxy overdensity.
The main conclusion will be the obvious one: Lyman-break galaxies must be very biased tracers 
of mass. A more complete analysis of the galaxy distribution at z ~ 3 will be presented elsewhere. 
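
The order-of-magnitude comparison above can be reproduced in a few lines. This is a rough sketch assuming an Einstein-de Sitter cosmology in $h_{100}^{-1}$ units, a bin centered at z = 3.09 with width 0.04, and the full 9' by 18' field; the helper names are illustrative.

```python
import math

C_KM_S = 299792.458  # speed of light [km/s]


def comoving_distance_eds(z, h100=1.0):
    """Comoving distance [Mpc] for Omega_M = 1 and H0 = 100 * h100 km/s/Mpc."""
    return 2.0 * (C_KM_S / (100.0 * h100)) * (1.0 - 1.0 / math.sqrt(1.0 + z))


def bin_dimensions(z_center, delta_z, width_arcmin, height_arcmin):
    """Comoving width, height, and depth [h100^-1 Mpc] of one redshift bin."""
    distance = comoving_distance_eds(z_center)
    depth = (comoving_distance_eds(z_center + 0.5 * delta_z)
             - comoving_distance_eds(z_center - 0.5 * delta_z))
    width = math.radians(width_arcmin / 60.0) * distance
    height = math.radians(height_arcmin / 60.0) * distance
    return width, height, depth


width, height, depth = bin_dimensions(3.09, 0.04, 9.0, 18.0)

# Linear growth in an Omega_M = 1 universe scales as 1/(1+z).
sigma_mass = 0.6 / (1.0 + 3.09)
ratio = 3.5 / sigma_mass
print(round(width, 1), round(height, 1), round(depth, 1), round(sigma_mass, 3), round(ratio, 1))
```

The bin comes out at roughly 8 by 16 by 15 $h_{100}^{-1}$ Mpc, and the galaxy overdensity exceeds the naive mass fluctuation by a factor in the low twenties, consistent with the "25 times" quoted in the text given the rounding involved.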

Our approach will be straight-forward. We will calculate a mass associated with the spike, 
treat the spike as a peak in the density field smoothed on this mass scale, and then calculate 
the probability of observing a peak so high within our survey volume for three representative 
cosmologies ($\Omega_M = 1$, $\Omega_M = 0.2$ open, and $\Omega_M = 0.3$ flat). The mass scale and peak height both
depend on the bias parameter $b \equiv \delta_{\rm gal}/\delta_{\rm mass}$, and we will see how large the bias must be in order
to make it reasonably probable that we would have observed the spike in our survey volume. 

The first step is to calculate the mass associated with the spike. For concreteness we will 
consider only the 15 objects between z = 3.074 and z = 3.108 to be associated with the "spike" 
structure. This is the interval that maximizes the significance of the spike according to the 
statistical technique in the appendix, and within it there is no evidence for substructure. A 
two-sample two-dimensional Kolmogorov-Smirnov test (Press et al. 1994) shows at the 95% level
that galaxies in the spike have a different areal distribution from spectroscopically observed galaxies 
outside the spike, so the apparent "edge" of the spike in the southern half of the field (Figure 2) is 
probably real. There is not much evidence though that we have observed an edge to the spike in 
any other direction, and thus we can only set a lower limit to its transverse size of ~ 9' x 11.5'.
For $\Omega_M = 1$, this corresponds to $\sim 11\,h_{70}^{-1}$ by $14\,h_{70}^{-1}$ comoving Mpc. The effective depth of the
structure for the same cosmology is $\sim 18\,h_{70}^{-1}/C$ comoving Mpc, where $C \equiv V_{\rm apparent}/V_{\rm true}$ takes
into account the redshift space distortions caused by peculiar velocities. The mass associated with
the structure is therefore $M = \bar\rho V (1 + \delta_m) = 4.0 \times 10^{14} (1 + \delta_m)/C \; h_{70}^{-1} M_\odot$, where $\bar\rho$ is the mean
density of the universe and $\delta_m$, the mass overdensity, is related to the observed galaxy overdensity
$\delta_{\rm gal,obs}$ through $1 + b\,\delta_m = C(1 + \delta_{\rm gal,obs})$. Following the prescription in the appendix we estimate
a galaxy overdensity in this region of $\delta_{\rm gal,obs} = 3.6^{+1.4}_{-1.2}$.
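
The mass coefficient can be cross-checked from the mean density and the dimensions of the structure. The sketch below assumes $\Omega_M = 1$, $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$, the standard critical density $2.775 \times 10^{11} h_{100}^2 M_\odot$ Mpc$^{-3}$, and approximate dimensions of 11 by 14 by 18/C comoving Mpc; these inputs and the function names are assumptions for illustration.

```python
RHO_CRIT_H100 = 2.775e11  # critical density [h100^2 M_sun / Mpc^3]


def mean_density(omega_m=1.0, h100=0.7):
    """Mean comoving matter density [M_sun / Mpc^3]."""
    return omega_m * RHO_CRIT_H100 * h100 ** 2


def spike_mass(delta_m, c_factor, sides_mpc=(11.0, 14.0), depth_mpc=18.0,
               omega_m=1.0, h100=0.7):
    """Mass [M_sun] of the structure: M = rho_bar * V * (1 + delta_m).

    The apparent (redshift-space) depth is divided by C to obtain the true depth.
    """
    volume = sides_mpc[0] * sides_mpc[1] * depth_mpc / c_factor
    return mean_density(omega_m, h100) * volume * (1.0 + delta_m)


# Coefficient multiplying (1 + delta_m) / C.
coefficient = spike_mass(delta_m=0.0, c_factor=1.0)
print(f"{coefficient:.2e}")
```

With these rounded inputs the coefficient lands slightly below $4 \times 10^{14} M_\odot$, in line with the value in the text.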

^We note that this mass scale is 1-2 orders of magnitude larger than the minimum mass scale that would be
derived for typical "spikes" in lower-redshift pencil-beam surveys such as those in Cohen et al. 1996a,b. The main
difference is that working with a relatively large field at z ~ 3 provides a much larger co-moving field of view and so
we are able to set a correspondingly larger lower limit to the size of structures we observe. Given our much sparser
sampling density, we are probably sensitive only to structures on relatively large angular scales; because the relevant
mass scales are (potentially) so different, the relationship between this structure and those seen in other pencil-beam
surveys is not at all clear.

It remains to estimate C. In principle we do not even know if C is greater or less than 1: a
collapsing object will appear more dense in redshift space than in real space, while an object that
has already collapsed and virialized will appear less dense in redshift space. We will see below
though that it is difficult to produce even a moderately non-linear overdensity on this mass scale
at z ~ 3.1, and so we will make the conservative assumption that we are observing an object
which is just breaking away from the Hubble expansion rather than an object which has already
collapsed. In this case we can use the Zeldovich approximation to correct for peculiar velocities. In
the Zeldovich approximation each mass element maintains its linear-theory velocity, and as a result
the density of a fluid element evolves according to

$$\rho/\bar\rho = 1 + \delta_m = [(1 - D\lambda_1)(1 - D\lambda_2)(1 - D\lambda_3)]^{-1}$$

where the $\lambda_i$ are initial eigenvalues of the tensor of deformation and $D(t)$ is the linear-growth
factor. High peaks in the density field are roughly isotropic (Bardeen et al. 1986), and so we
can reasonably use a simple expression for the redshift-distortion factor C (e.g. Padmanabhan
1993, § 8) that holds when we view a collapsing object along a principal axis:

$$C = \frac{1 - D(1 + f)\lambda_3}{1 - D\lambda_3}$$

where $f \equiv d\ln D/d\ln a \simeq \Omega_M^{0.6}(z)$ (Lahav et al. 1991). For $\lambda_1 = \lambda_2 = \lambda_3$ this becomes
$C = 1 + f - f(1 + \delta_m)^{1/3}$, which is the expression we use.

After we have estimated the peak height $\nu$ (defined below), we can return to this peculiar-velocity
correction and gauge how wrong our assumption of isotropy is likely to have been. In
the high-peak limit the eigenvalues of the deformation tensor have approximate relative sizes
$\lambda_1 : \lambda_2 : \lambda_3 \sim 1 + 1.5/\nu : 1 : 1 - 1.5/\nu$ (Bond 1988). We will see below that $\nu$ ranges from 2-5 for
the parameter values we consider (Table 2), and so the assumption of isotropy is poor in some 
cases. Fortunately the peculiar-velocity correction is smallest when the assumption is worst, and in 
the end our conclusions would not be affected much by the expected level of anisotropy. We make 
a crude attempt to quantify this by randomly rotating a deformation tensor with eigenvalues in 
the "typical" ratio above and calculating the exact peculiar velocity correction for each rotation. 
Even in the worst case ($\nu \sim 2$) the $1\sigma$ spread in C is only about 10% across the randomly rotated
ensemble. This is small compared to the Poisson uncertainty in $\delta_{\rm gal,obs}$ and will be neglected.
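
A rotation experiment of this kind can be sketched as follows. Rotating the deformation tensor is equivalent to drawing a random line-of-sight direction $\hat n$ in the principal-axis frame. The closed form used here, $C = 1 - f \sum_j n_j^2 D\lambda_j/(1 - D\lambda_j)$, is our own reduction of the Zeldovich redshift-space Jacobian via the matrix determinant lemma and is not taken from the text; it reduces to the principal-axis expression when $\hat n$ lies along one axis. The inputs ($\nu$, $\delta_m$, $f$) are illustrative, and $\nu > 1.5$ is assumed so that all eigenvalues are positive.

```python
import math
import random


def scaled_eigenvalues(nu, delta_m):
    """D*lambda_i in the ratio 1+1.5/nu : 1 : 1-1.5/nu that yield overdensity delta_m."""
    ratios = (1.0 + 1.5 / nu, 1.0, 1.0 - 1.5 / nu)
    target = 1.0 / (1.0 + delta_m)
    lo, hi = 0.0, 1.0 / ratios[0]  # keep every (1 - D*lambda_i) positive
    for _ in range(200):  # bisection on the overall scale
        mid = 0.5 * (lo + hi)
        if math.prod(1.0 - mid * r for r in ratios) > target:
            lo = mid
        else:
            hi = mid
    scale = 0.5 * (lo + hi)
    return tuple(scale * r for r in ratios)


def distortion_factor(d_lambda, line_of_sight, f):
    """C = V_apparent / V_true for a unit line-of-sight vector in the principal-axis frame."""
    return 1.0 - f * sum(n * n * x / (1.0 - x) for n, x in zip(line_of_sight, d_lambda))


def random_unit_vector(rng):
    """Isotropically distributed unit vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def distortion_spread(nu, delta_m, f, samples=20000, seed=0):
    """Mean and standard deviation of C over random orientations."""
    rng = random.Random(seed)
    d_lambda = scaled_eigenvalues(nu, delta_m)
    values = [distortion_factor(d_lambda, random_unit_vector(rng), f) for _ in range(samples)]
    mean = sum(values) / samples
    variance = sum((v - mean) ** 2 for v in values) / samples
    return mean, math.sqrt(variance)
```

Calling `distortion_spread(2.0, delta_m, f)` with the $\delta_m$ and $f$ appropriate to a given cosmology and bias returns the mean and scatter of C over orientations; the fractional scatter depends on those inputs.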

We can now calculate $M_{\rm spike}$ and $\delta_m$ for any assumed bias. Selected values are shown in Table
2. The final step is to calculate whether the density field smoothed on mass scale $M_{\rm spike}$ would
plausibly contain a peak of height $\delta_m$ in our survey volume. This will let us assess whether an
assumed bias value is consistent with the existence of the spike. Because rms fluctuations on the
(large) mass scale $M_{\rm spike}$ are much smaller than unity, the smoothed density field should be well
approximated by linearly evolved initial conditions, and we should be able to use linear theory to
calculate the probability of observing a peak of height $\delta_m$. The one complication is that nonlinear
effects may have begun to accelerate the peak's growth; in order to analyze the spike with linear
theory we will need to estimate the linear theory height $\delta_L$ that corresponds to the measured
peak height $\delta_m$. Spherically-symmetric collapse is one of the few cases where nonlinear growth is
quantitatively understood, and it would be convenient if we could use spherical collapse results to
correct for nonlinear growth in our spike. But how accurately can we model the growth of a peak
through spherical collapse? One issue is whether the peak is likely to have been roughly spherical
in the first place, and we saw above this is strongly affected by its height $\nu \equiv \delta_L/\sigma$, where $\sigma$ is
the rms relative mass fluctuation on the mass scale of the spike. A second issue is whether the
previous collapse of smaller perturbations within the peak could generate non-radial motions large
enough to slow down, or prevent, the peak's collapse (e.g. Peebles 1990). This depends on the
spectral index of density fluctuations n, where $P(k) = A k^n$, because (for fixed A) n controls the
level of small-scale power. Bernardeau (1994) argues that for $\nu \gtrsim 2$ and $n < -1$ typical peaks will
grow at roughly the rate predicted by the spherical collapse model, at least until $\delta_m \sim 4$, and
this conclusion is supported by the N-body experiment of Thomas & Couchman (1992). Since
in all scenarios considered the spike overdensity satisfies $\nu \gtrsim 2$, and since CDM spectra on the
mass scales of interest to us have effective spectral indices $n_{\rm eff} \lesssim -1$, we will adopt the spherical
collapse $\delta_m \to \delta_L$ transformation as approximated by Bernardeau: $\delta_L = 1.5[1 - (1 + \delta_m)^{-2/3}]$.
This should at least give us a lower limit on $\delta_L$, since previous collapse on smaller scales (which
we have neglected) slows down non-linear growth.
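
The chain from an assumed bias to a linear peak height can be sketched numerically. The code below solves $1 + b\,\delta_m = C(1 + \delta_{\rm gal,obs})$ together with the isotropic expression $C = 1 + f - f(1 + \delta_m)^{1/3}$ by bisection, then applies the spherical-collapse approximation for $\delta_L$. It assumes $b \ge 1$ (so that the root is bracketed), and the default inputs ($\delta_{\rm gal,obs} = 3.6$, $f = 1$ for $\Omega_M = 1$) are illustrative choices rather than a reproduction of Table 2.

```python
def distortion_isotropic(delta_m, f):
    """Redshift-distortion factor C for an isotropically collapsing region."""
    return 1.0 + f - f * (1.0 + delta_m) ** (1.0 / 3.0)


def linear_overdensity(delta_m):
    """Spherical-collapse approximation for the linear-theory overdensity delta_L."""
    return 1.5 * (1.0 - (1.0 + delta_m) ** (-2.0 / 3.0))


def residual(delta_m, bias, delta_gal_obs, f):
    """Zero when 1 + b*delta_m = C(delta_m) * (1 + delta_gal_obs)."""
    return 1.0 + bias * delta_m - distortion_isotropic(delta_m, f) * (1.0 + delta_gal_obs)


def solve_mass_overdensity(bias, delta_gal_obs=3.6, f=1.0):
    """Mass overdensity delta_m implied by an assumed bias (requires bias >= 1)."""
    lo, hi = 0.0, delta_gal_obs  # residual < 0 at lo and >= 0 at hi when bias >= 1
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if residual(mid, bias, delta_gal_obs, f) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for b in (2.0, 4.0, 6.0):
    dm = solve_mass_overdensity(b)
    print(b, round(dm, 3), round(distortion_isotropic(dm, 1.0), 3), round(linear_overdensity(dm), 3))
```

A larger assumed bias gives a smaller mass overdensity, a distortion factor closer to unity, and a lower linear peak height, which is why high bias values make the spike easier to accommodate.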

Table 2 shows two estimates of the number density of regions with mass overdensity $> \delta_L$
for each cosmology and bias value. The first estimate $N_{\rm Gauss}$ is the number density $n_{\rm pk}(> \delta_L)$ of
peaks of height $> \delta_L$ in the Gaussian-smoothed density field, from Bardeen et al. (1986, hereafter
BBKS). For the second estimate we use the traditional 
where $\sigma_{\rm STH}$ is the rms relative mass fluctuation in the density field smoothed by a spherical
top-hat on mass scale $M_{\rm spike}$. Both $N_{\rm Gauss}$ and $N_{\rm STH}$ have been normalized to units of 1 over
our survey volume, which we will take to be an 18' by 9' rectangular field between z = 2.7 and
z = 3.4. If $N \sim 1$ in this table, then we would expect to see on average one peak with linear
height $> \delta_L$ in a volume this size, and the corresponding bias value and cosmology are roughly
consistent with the existence of the spike. $N_{\rm Gauss}$ ($= n_{\rm pk}$) is also presented in graphical form in
Figure 4, to give an idea of the uncertainties involved. $N_{\rm Gauss}$ and $N_{\rm STH}$ were calculated assuming
a Harrison-Zeldovich (n = 1) primordial spectrum modified by the BBKS adiabatic CDM transfer
function with $q = k/\Gamma h$ ($\Gamma$ is a spectral shape parameter which we will discuss more below) and
normalized so that

$$\sigma_8(z) = \frac{\sigma_8(0)}{1 + z} \, \frac{g(\Omega_M(z), \Omega_\Lambda(z))}{g(\Omega_M(0), \Omega_\Lambda(0))}$$

where $g(\Omega_M, \Omega_\Lambda)$ is the linear growth suppression factor from Carroll et al. (1992) and $\sigma_8(0)$, the
z = 0 cluster normalization, is from Eke et al. (1996). We adopt cluster normalization because
it is on the same length scale as our structure; COBE measurements are on a considerably 
larger scale, and uncertainties in the spectral tilt n make COBE measurements less constraining. 
Since the normalization we have adopted is determined on roughly the same scale as the spike, 
our conclusions will be insensitive to the shape of the power spectrum. In making Table 2 and 
Figure 4 we used Sugiyama's (1995) CDM shape parameter $\Gamma \simeq \Omega_M h_{100} \exp(-\Omega_b - \Omega_b/\Omega_M)$
with $h_{100} = 0.7$ and $\Omega_b = 0.024/h_{100}^2$ (Tytler et al. 1996), but our conclusions about b would not
change significantly for any value of $\Gamma$ in the range $0.1 < \Gamma < 0.7$.
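
The shape parameter and the redshift scaling of the normalization can be evaluated as in the sketch below. It assumes that the normalization scales as $\sigma_8(z) = \sigma_8(0)\,g(z)/[(1 + z)\,g(0)]$ and uses the widely quoted fitting formula for $g$ from Carroll et al. (1992); $\sigma_8(0)$ is left as an input because the cluster-normalization values are not given in the text. Function names are illustrative.

```python
import math


def shape_parameter(omega_m, h100=0.7, omega_b_h2=0.024):
    """CDM shape parameter Gamma = Omega_M * h * exp(-Omega_b - Omega_b / Omega_M)."""
    omega_b = omega_b_h2 / h100 ** 2
    return omega_m * h100 * math.exp(-omega_b - omega_b / omega_m)


def growth_suppression(omega_m, omega_l):
    """Fitting formula for the linear growth suppression factor g(Omega_M, Omega_Lambda)."""
    return 2.5 * omega_m / (
        omega_m ** (4.0 / 7.0) - omega_l + (1.0 + omega_m / 2.0) * (1.0 + omega_l / 70.0)
    )


def density_parameters_at_z(omega_m0, omega_l0, z):
    """Omega_M(z) and Omega_Lambda(z) for present-day values omega_m0, omega_l0."""
    omega_k0 = 1.0 - omega_m0 - omega_l0
    e2 = omega_m0 * (1.0 + z) ** 3 + omega_k0 * (1.0 + z) ** 2 + omega_l0
    return omega_m0 * (1.0 + z) ** 3 / e2, omega_l0 / e2


def sigma8_at_z(sigma8_0, z, omega_m0, omega_l0):
    """Linearly evolved sigma_8 at redshift z."""
    omega_m_z, omega_l_z = density_parameters_at_z(omega_m0, omega_l0, z)
    growth_ratio = growth_suppression(omega_m_z, omega_l_z) / growth_suppression(omega_m0, omega_l0)
    return sigma8_0 / (1.0 + z) * growth_ratio


for omega_m0, omega_l0 in ((1.0, 0.0), (0.2, 0.0), (0.3, 0.7)):
    print(omega_m0, omega_l0, round(shape_parameter(omega_m0), 3),
          round(sigma8_at_z(1.0, 3.1, omega_m0, omega_l0), 3))
```

For $\Omega_M = 1$ the growth ratio is unity and the scaling reduces to $\sigma_8(0)/(1 + z)$; in the low-density models fluctuations at z ~ 3 are a larger fraction of their present-day amplitude.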

It is clear from Table 2 and Figure 4 that an overdensity similar to the one we observe would
not occur in any of these cosmologies unless Lyman-break galaxies were very biased tracers of
mass. The minimum bias values for $\Omega_M = 1$, $\Omega_M = 0.2$ open, and $\Omega_M = 0.3$ flat are approximately
6, 2, and 4 respectively. These high values have interesting implications. In the biasing model of
Mo & White (1996), the mass of a dark halo M can be (implicitly) estimated from its bias b and
redshift z through

$$\sigma(M, z) = \frac{\delta_c}{\sqrt{(b - 1)\delta_c + 1}}$$

where $\sigma(M, z)$ is the rms relative mass fluctuation on scale M at redshift z, and $\delta_c \simeq 1.7$. Because
$\sigma$ is a decreasing function of M, heavier halos are more highly biased. From Table 2, the large
bias values required to explain the spike imply Lyman-break halo masses of a few times $10^{12} M_\odot$
in each cosmology considered. The exact halo mass that corresponds to a given bias is spectrum
dependent, and can change by a factor of a few for choices of $\Gamma$ different from the $h_{70}$ Sugiyama
(1995) value we have used.
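
The relation between bias and the rms fluctuation on the halo mass scale can be evaluated directly. The sketch below uses the expression above, which is equivalent to $b = 1 + (\nu^2 - 1)/\delta_c$ with $\nu = \delta_c/\sigma$; converting the resulting $\sigma(M, z)$ into a halo mass requires a power spectrum and is not attempted here. Function names are illustrative.

```python
import math

DELTA_C = 1.7  # collapse threshold adopted in the text


def sigma_for_bias(bias, delta_c=DELTA_C):
    """rms mass fluctuation sigma(M, z) of halos with the given bias."""
    return delta_c / math.sqrt((bias - 1.0) * delta_c + 1.0)


def bias_for_sigma(sigma, delta_c=DELTA_C):
    """Inverse relation: b = 1 + (nu^2 - 1) / delta_c with nu = delta_c / sigma."""
    nu = delta_c / sigma
    return 1.0 + (nu * nu - 1.0) / delta_c


for b in (2.0, 4.0, 6.0):
    print(b, round(sigma_for_bias(b), 3))
```

Higher bias corresponds to smaller $\sigma(M, z)$ and therefore, since $\sigma$ decreases with M, to more massive halos.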

In summary, then, we started out with a galaxy overdensity that naively seemed ~ 25 times
larger than the expected $\sigma_{\rm mass}$. We argued that part of this factor of 25 could be accounted for
through peculiar velocities, part through nonlinear effects, and part through the suppression of
linear growth in cosmologies with $\Omega_M < 1$. But we were still left with a galaxy overdensity several
times larger than the expected $\sigma_{\rm mass}$, and we concluded that Lyman break galaxies are very biased
tracers of mass. A large bias is not unexpected, provided these galaxies are massive systems. The
existence of the spike lends further support to a halo mass scale of $\sim 10^{12} M_\odot$ for these galaxies.



4. CORRELATION OF PEAKS WITH QSO ABSORPTION FEATURES 

Two of the objects in the spectroscopic sample are high redshift QSOs. One of them, called
SSA22a D13, has $z_{\rm em} = 3.083$ ($\mathcal{R} = 21.7$) and thus is part of the prominent redshift "spike" at
$\langle z \rangle = 3.090$, discussed at length in §3. A second QSO, called SSA22a D14, has $z_{\rm em} = 3.356$ and
$\mathcal{R} = 20.8$, and even at the coarse resolution (~ 12 Å) of our survey spectra several metal line
absorption systems are evident (see Figure 5). There is a probable damped Lyman $\alpha$ system
at $z_{\rm abs} = 2.937$, an optically thick Lyman limit system at $z_{\rm abs} = 3.288$, and another metal line
system having a relatively strong Lyman $\alpha$ absorption line and also the C IV $\lambda\lambda$1548, 1550 doublet
at $z_{\rm abs} = 3.094$. We note that these coincide with the 3 strongest peaks in the redshift histogram
(although, as discussed above, only the one at z = 3.09 is formally of high statistical significance). 
We also note that there is no confirmed Lyman break galaxy or photometrically selected candidate 
that is near enough to the sightline of SSA22a D14 to be (plausibly) responsible for any of these 
absorption systems, adding to the growing number of high redshift absorption systems that are 
fainter than our typical limits for the detection of Lyman break galaxies, $\mathcal{R} = 25.5$ (see Steidel et
al. 1995). It would clearly be of great interest to obtain a higher resolution spectrum of SSA22a 
D14 so that the nature of the absorption line systems could be better discerned. 

There are two additional metal line systems with relatively strong C IV doublets but weaker
Lyman $\alpha$ lines at $z_{\rm abs} = 2.740$ and 2.801. There are no obvious features at these redshifts in the
galaxy redshift histogram, but these redshifts are near the tail of our selection function and so the 
volume is not well-sampled using our current color criteria. 

The fortuitous placement of two reasonably bright QSOs within the SSA22 field is obviously 
of great interest for a comparison of the structure seen in emission versus that seen in absorption
along the same line of sight. Although the significance is not completely clear, the fact that all 
three of the most prominent features in the redshift histogram have counterparts in metal line 
absorption systems is particularly intriguing (cf. Francis et al. 1996) and suggestive of large-scale 
coherence (with large covering fraction) in the overall distribution of gas and stars at high redshift. 
While suggestions of large scale structure in absorbing gas from correlations of QSO absorption 
line systems (both transverse to and along the line of sight) have been many (e.g. Jakobsen et 
al. 1986; Sargent & Steidel 1987; Steidel & Sargent 1992; Dinshaw and Impey 1996; Williger 
et al. 1996, Quashnock et al. 1996), it is now possible to compile large samples of high redshift 
galaxies at the same redshifts, and there is clearly a great deal of very fruitful work to be done in 
combining the absorption and emission techniques to obtain as complete a picture as possible of 
the distribution of baryons at these early epochs. 

5. SUMMARY 

We have presented evidence for a large structure of galaxies at z ~ 3.1 on the basis of 
a pronounced "overdensity" in a redshift histogram of photometrically selected field galaxies, 
coupled with the distribution of the galaxies within this redshift-space "spike" on the plane of the 
sky. A relatively simple analysis of this structure (§3) shows that these early star-forming galaxies 
must be very biased tracers of mass, with a higher bias ($b \gtrsim 6$) required in standard $\Omega_M = 1$ CDM
than in $\Omega_M = 0.2$ open CDM ($b \gtrsim 2$) or in $\Omega_M = 0.3$ $\Lambda$CDM ($b \gtrsim 4$). A large bias is expected
for massive galaxies, and a major conclusion of this paper is that these Lyman-break galaxies are
indeed massive systems ($M \sim 10^{12} M_\odot$) as we originally claimed on different grounds (Steidel et
al. 1996). A similarly large mass for these galaxies was also predicted by Baugh et al. (1997) on 
the basis of simple assumptions about star and dark-halo formation, and it is encouraging that 
independent estimates of these galaxies' masses should agree so well. 

An interesting sidelight is that the volume of one of our survey fields ($\sim$ few $\times 10^5$ Mpc$^3$) is
well-matched to the present-day density of X-ray selected clusters with $kT \gtrsim 2.5$ keV ($\sim 3 \times 10^{-6} h_{70}^3$
Mpc$^{-3}$; Eke et al. 1996). In an average field of this volume we would expect to see 0.3, 1.0, and 1.1
structures destined to become clusters by the present day for $\Omega_M = 1$, $\Omega_M = 0.2$ open, and
$\Omega_M = 0.3$ flat, respectively, and this raises the possibility that the structure we have found at z ~ 3.1 may be an
Abell cluster in its infancy. It is on the right mass scale, and (depending on the bias) its linear
overdensity $\delta_L$ when evolved to the present day could reach the spherical top-hat threshold of
$\delta_L \sim 1.7$ for collapse and virialization. Given the sampling density of the photometrically selected
candidates in the SSA22a+b field, we have probably only found ~ 30-50% of the objects in the
$\langle z \rangle = 3.09$ structure to the same apparent magnitude level. If each Lyman-break galaxy is the
progenitor of a present-day galaxy brighter than L*, this would imply ~ 30-50 such galaxies
within the structure, lending further support to the notion that the "spike" is a proto-cluster.
In many respects the structure we have identified is very similar to a less evolved version of the 
structure found at z = 0.985 by Le Fevre et al. (1994) as part of the CFRS redshift survey. 
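
The completeness argument in this paragraph is simple arithmetic. A minimal sketch, assuming the 15 galaxies quoted for the structure and treating the 30% and 50% sampling fractions as illustrative bounds (the function and variable names are ours, not part of the survey software):

```python
# Completeness correction: scale the observed member count by the
# fraction of candidates that have actually been observed spectroscopically.
N_FOUND = 15                    # galaxies identified in the structure (from the text)
SAMPLED_FRACTIONS = (0.5, 0.3)  # assumed bounds on the fraction found (~30-50%)


def implied_members(n_found, sampled_fraction):
    """Estimate the true number of members for a given sampling fraction."""
    if not 0.0 < sampled_fraction <= 1.0:
        raise ValueError("sampled_fraction must lie in (0, 1]")
    return n_found / sampled_fraction


low, high = (implied_members(N_FOUND, f) for f in SAMPLED_FRACTIONS)
print(f"implied members: {low:.0f} to {high:.0f}")
```

With these inputs the implied membership runs from about 30 to about 50, matching the range quoted above.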

The results we have presented above are based on a single field of a "targeted" redshift survey
using Lyman break photometric selection criteria and focusing on the redshift range 2.7 ≲ z ≲ 3.4.
A larger sample will test the validity of the conclusions outlined here and will give a full picture 
of the correlation function of star-forming galaxies at high redshift on scales from kpc to tens of 
Mpc. Whether or not one can reach significant cosmological conclusions on the basis of only about 
70 redshifts in one field, we regard it as extremely promising that one can now feasibly observe 
the large-scale distribution of galaxies at very early epochs. We would also like to emphasize 
how efficiently one can address these issues by using photometric methods to "pre-select" the 
volume/epoch studied. Lyman break galaxies at z ~ 3 are merely one example. 

We would like to thank the staff of both the Palomar and the Keck Observatories, as well 
as the entire team of people, led by J.B. Oke and J. Cohen, responsible for the Low Resolution 
Imaging Spectrograph, for making these observations possible. We would also like to thank A. 
Phillips for allowing us to use his slitmask alignment software. We benefited from several useful 
conversations with T. Padmanabhan and J. Peacock. S.D.M. White's detailed comments on an 
earlier draft improved the clarity of §3. We are grateful to the referee, D. Koo, for constructive 
comments. CCS acknowledges support from the U.S. National Science Foundation through grant 
AST 94-57446, and from the Alfred P. Sloan Foundation. MG has been supported through grant 
HF-01071.01-94A from the Space Telescope Science Institute, which is operated by the Association 
of Universities for Research in Astronomy, Inc. under NASA contract NAS 5-26555. 

A. STATISTICAL SIGNIFICANCE OF FEATURES IN THE REDSHIFT DISTRIBUTION

We use a simple method to look for clustering in the galaxy redshifts. Rather than place the
redshifts into bins and look for unusually crowded bins (e.g. Cohen et al. 1996a), we scan our
unbinned data for galaxy pairs, triplets, quartets, and so on whose redshifts are closer together
than we would expect from Poisson statistics. If the product of our selection function and the
density of Lyman-break galaxies were equal to a known constant λ over the redshift range of
interest, then without clustering the probability that a group of N galaxies would span a redshift
interval Δz would be given by the Erlangian distribution (Eadie et al. 1971)

$$p(\Delta z \mid N \lambda) = \lambda (\lambda \Delta z)^{N-2} \exp(-\lambda \Delta z) / (N-2)!$$

In this case it would be easy to find significantly clustered groups of galaxies; we could simply
consider each group of N neighboring galaxies in turn, and calculate the probability that in the
absence of clustering they would be contained in a redshift interval smaller than the observed Δz_0:

$$p(\Delta z < \Delta z_0) = \int_0^{\Delta z_0} p(\Delta z \mid N \lambda)\, d(\Delta z) = \frac{\gamma(N-1, \lambda \Delta z_0)}{\Gamma(N-1)}$$

with γ, the (unnormalized) incomplete gamma function, given (e.g.) in Press et al. (1994). If this
probability were very close to zero, it would suggest that the small separation of the N neighbors
was inconsistent with Poisson sampling of a uniform background, and we would consider those N
galaxies a cluster candidate. (In this section we use "cluster" to denote an arbitrarily small, or
large, group of galaxies whose proximity is unexpected from Poisson statistics.)
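
For integer N the ratio of the incomplete to the complete gamma function reduces to a Poisson tail sum, which makes the known-rate case easy to evaluate. The following is a minimal sketch under that assumption; the function name and the example rate and interval are hypothetical and are not taken from the survey.

```python
import math


def span_probability(n_gal, rate, span):
    """P(N neighbouring galaxies span less than `span`) for a known Poisson rate.

    Equals gamma(N-1, rate*span) / Gamma(N-1), which for integer N is the
    probability that a Poisson variable of mean rate*span is at least N-1.
    """
    if n_gal < 2:
        raise ValueError("a span needs at least two galaxies")
    if rate < 0 or span < 0:
        raise ValueError("rate and span must be non-negative")
    k = n_gal - 1
    x = rate * span
    # Sum the Poisson tail directly. This avoids the cancellation that
    # 1 - (sum of the first k terms) suffers when the answer is tiny.
    # Adequate for the small rate*span values of interest here; exp(-x)
    # underflows for x above roughly 700.
    term = math.exp(-x) * x**k / math.factorial(k)
    total = 0.0
    j = k
    while term > 0.0 and term > 1e-17 * total:
        total += term
        j += 1
        term *= x / j
    return total


# Illustrative numbers only: 5 galaxies within dz = 0.01 at 50 galaxies per unit z.
p_small_span = span_probability(5, 50.0, 0.01)
```

A value of `p_small_span` close to zero would mark the group as a cluster candidate in the sense used above.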

In practice the product λ of our selection function and the mean density of Lyman-break
galaxies is neither precisely known nor constant with redshift, and so we proceed as follows.
First, we estimate the shape of λ(z) by placing the redshifts of all our spectroscopically-confirmed
marginal and robust Lyman-break galaxies (there are 208, of which 69 are in SSA22a or SSA22b)
into bins of width Δz = 0.2, and fitting the resulting histogram with a cubic spline (see Fig 1).
We then transform our redshifts z into a coordinate system t in which λ is roughly constant, and
the problem is reduced to searching for significantly clustered groups of galaxies in the midst of a
constant Poisson background of unknown intensity λ.

This is just the "on source / off source" problem familiar from high-energy astrophysics,
where we observe N_1 counts during a time interval t_1 in one part of the sky, and N_2 during t_2 in
another, and must decide whether there is any evidence for an intrinsically higher count rate in
region 2. In our case we want to decide whether there is evidence for the presence of a cluster
(i.e. an elevated count rate) in the redshift interval t_2 between two arbitrarily selected galaxies.
We begin by estimating the "background count rate" λ from the density of galaxies outside this
candidate cluster. If there are N_1 galaxies at redshifts less than or equal to the lowest-redshift
cluster member and N_3 galaxies at redshifts greater than or equal to the highest-redshift cluster
member, Bayes' Theorem gives

$$p(\lambda \mid N_1 N_2 N_3 t_1 t_3) = (t_1 + t_3)\,[\lambda (t_1 + t_3)]^{N_1+N_3-2} \exp[-\lambda (t_1 + t_3)] / (N_1+N_3-2)!$$

where t_1 is the interval between the lowest-redshift galaxy in our sample and the lowest-redshift
cluster member, and t_3 is the interval between the highest-redshift cluster member and the
highest-redshift galaxy in our sample. With this probabilistic distribution for λ, we can proceed
analogously to the case above where λ was known exactly. Since the probability that the group of
N_2 galaxies would be contained in exactly the observed interval t_2 is

$$p(t_2 \mid N_1 N_2 N_3 t_1 t_3) = \int d\lambda\, p(t_2 \mid \lambda N_1 N_2 N_3 t_1 t_3)\, p(\lambda \mid N_1 N_2 N_3 t_1 t_3),$$

the probability that they would be contained in an interval smaller than the observed t_2 is

$$p(t < t_2) = I_x(N_1 + N_3 - 1, N_2 - 1) \equiv \zeta$$

where x = (t_1 + t_3)/(t_1 + t_2 + t_3), I_x is the incomplete Beta function (Press et al. 1994), and we
have substituted the Erlangian distribution for p(t | N λ), assumed a uniform prior for λ over the
interval [0, λ_max] and taken the limit λ_max → ∞. (See Jaynes (1990) or Loredo (1992) for more
detailed derivations of similar results.) This expression for ζ lets us assess the degree to which
the redshifts of any N_2 adjacent galaxies are inconsistent with Poisson sampling of a uniform
background; groups of galaxies with ζ very close to 0 (or to 1) are not expected in the absence
of clustering, and so by calculating ζ for all groups of adjacent galaxies in our sample, with
N_2 = 2, 3, 4, ..., we can locate the most statistically significant clusters (or voids) in our redshift
distribution. In practice we restrict our attention to clusters (or voids) with Δz < 0.2, since larger
features are interpreted with this technique as part of the selection function rather than of the
galaxy distribution.
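
The scan over groups of adjacent galaxies can be sketched as follows. This is an illustration rather than the survey code, and it rests on two assumptions that should be stated plainly. First, it uses the standard regularized incomplete Beta function, for which I_x → 1 as x → 1; under that convention the probability of a span smaller than t_2 evaluates to 1 − I_x(N_1 + N_3 − 1, N_2 − 1) = I_{1−x}(N_2 − 1, N_1 + N_3 − 1), and the sketch returns that quantity so that tight groups give small values, as the text requires. Second, the input is taken to be already transformed to the coordinate t, with `max_span` expressed in the same units; all function names and the toy sample are hypothetical.

```python
from math import comb


def reg_inc_beta(a, b, x):
    """Regularized incomplete Beta function I_x(a, b) for positive integers a, b.

    Uses the identity I_x(a, b) = P(Binomial(a + b - 1, x) >= a).
    """
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1.0 - x)**(n - j) for j in range(a, n + 1))


def zeta(t, lo, hi):
    """Probability that galaxies lo..hi (inclusive; t sorted) span less than observed."""
    n1 = lo + 1           # galaxies at or below the lowest cluster member
    n2 = hi - lo + 1      # candidate cluster members
    n3 = len(t) - hi      # galaxies at or above the highest cluster member
    t1, t2, t3 = t[lo] - t[0], t[hi] - t[lo], t[-1] - t[hi]
    one_minus_x = t2 / (t1 + t2 + t3)
    # 1 - I_x(N1+N3-1, N2-1), written via the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    return reg_inc_beta(n2 - 1, n1 + n3 - 1, one_minus_x)


def scan_groups(t, max_span):
    """Return (zeta, lo, hi) for every group of adjacent galaxies, most significant first."""
    t = sorted(t)
    results = []
    for lo in range(len(t) - 1):
        for hi in range(lo + 1, len(t)):
            if t[hi] - t[lo] > max_span:
                break
            if lo == 0 and hi == len(t) - 1:
                continue  # the whole sample leaves no background to compare against
            results.append((zeta(t, lo, hi), lo, hi))
    return sorted(results)


# Toy sample: an evenly spread background plus four tightly packed galaxies.
toy_t = [0, 1, 2, 3, 4, 5, 5.01, 5.02, 5.03, 6, 7, 8, 9, 10]
best_zeta, best_lo, best_hi = scan_groups(toy_t, max_span=0.5)[0]
```

In the toy sample the most significant group is the four galaxies between t = 5 and t = 5.03, which is the behavior the procedure is meant to have.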

After using this procedure to locate candidate clusters in our redshift distribution, we estimate
the significance of each cluster by comparing its ζ to the distribution of ζ in simulated data sets
with redshifts randomly drawn from the smooth selection function in Fig 1. If a candidate cluster
has a ζ so small that it appears in only 1 of 100 random data sets, we assign that cluster a
significance of 99%.
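
A minimal Monte Carlo sketch of this calibration is given below. It assumes the data have already been mapped to the coordinate t in which the background is uniform, so that random data sets can be drawn uniformly on [0, 1] instead of from the smooth selection function itself, and it evaluates ζ as the probability of a span smaller than observed using the standard regularized incomplete Beta convention. Function names, the trial count, and the seed are illustrative choices.

```python
import random
from math import comb


def min_zeta(t, max_span):
    """Smallest zeta over all groups of adjacent galaxies no wider than max_span."""
    t = sorted(t)
    total = t[-1] - t[0]
    n = len(t) - 1                      # number of binomial trials, a + b - 1
    best = 1.0
    for lo in range(len(t) - 1):
        for hi in range(lo + 1, len(t)):
            span = t[hi] - t[lo]
            if span > max_span:
                break
            a = hi - lo                 # N2 - 1; b = N1 + N3 - 1 = len(t) - a
            frac = span / total         # this is 1 - x in the notation of the text
            z = sum(comb(n, j) * frac**j * (1.0 - frac)**(n - j) for j in range(a, n + 1))
            best = min(best, z)
    return best


def significance(observed_zeta, n_gal, max_span, n_trials=200, seed=0):
    """Fraction of random data sets containing no group with zeta <= observed_zeta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        sample = [rng.random() for _ in range(n_gal)]
        if min_zeta(sample, max_span) <= observed_zeta:
            hits += 1
    return 1.0 - hits / n_trials


# Illustrative call: a candidate with zeta = 1e-5 in a sample of 14 galaxies.
sig = significance(1e-5, n_gal=14, max_span=0.1)
```

A candidate whose ζ is matched or beaten in only 1 of 100 random data sets would receive a value of 0.99 from `significance`, in line with the definition above.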

This technique has the obvious flaws that the estimate of the background level is not correct 
when there is more than one strong spike in the data, and that we take no account of correlations 
between galaxies in calculating significances. Neither is critical in the present application, because 
we have only one dominant spike, and it is on a scale many times larger than the Lyman-break 
galaxy correlation length (Giavalisco et al. 1997). 

Once we have identified a significant cluster we may want to associate an overdensity δ with
it through the relationship δ = n_cluster/n̄ − 1, where n_cluster is the number density of galaxies within
the cluster and n̄ is the background number density. n̄ will in general be well determined by the
many galaxies outside the cluster, but n_cluster is more problematic. We are defining the edge of
the cluster to fall exactly on the lowest- and highest-redshift members, so should we include those
two galaxies in estimating n_cluster? In §3 we exclude from the n_cluster calculation any galaxy whose
position is explicitly used in defining the cluster boundary. There are three such galaxies (one
sets the low-redshift edge, another the high-redshift edge, and the third sets the southern edge of
the cluster; Fig 2), and to estimate n_cluster we use the formula

$$p(n \mid N V) = V (nV)^{N-3} \exp(-nV) / (N-3)!$$

with N = 15.
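
The formula is a gamma distribution in n with shape N − 2 and rate V, so its mode, mean, and width follow directly. The sketch below evaluates it and the resulting overdensity; the volume and background density used in the example are placeholders chosen for illustration, not values from this paper.

```python
import math


def density_posterior(n, n_gal, volume):
    """p(n | N V) = V (nV)^(N-3) exp(-nV) / (N-3)! for a number density n >= 0."""
    k = n_gal - 3                       # galaxies not used to define the boundary
    if k < 0:
        raise ValueError("need at least three galaxies")
    x = n * volume
    if x <= 0.0:
        return volume if (k == 0 and x == 0.0) else 0.0
    # Work in logarithms to avoid overflow in the power and the factorial.
    return volume * math.exp(k * math.log(x) - x - math.lgamma(k + 1))


def posterior_summary(n_gal, volume):
    """Mode, mean, and standard deviation of the gamma-distributed density."""
    k = n_gal - 3
    return {
        "mode": k / volume,
        "mean": (k + 1) / volume,
        "std": math.sqrt(k + 1) / volume,
    }


def overdensity(n_cluster, n_background):
    """delta = n_cluster / n_background - 1."""
    return n_cluster / n_background - 1.0


# Placeholder inputs: N = 15 as in the text; V and the background density are hypothetical.
summary = posterior_summary(n_gal=15, volume=2.0)
delta_at_mode = overdensity(summary["mode"], n_background=1.5)
```

With N = 15 the posterior peaks at n = 12/V, reflecting the 12 galaxies that remain once the three boundary-defining objects are excluded.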
Table 1. Objects Within the ⟨z⟩ = 3.09 Structure

Object          RA^a   Dec^a   ℛ       G−ℛ    U_n−G    Redshift^b   SFR^c
SSA22a C3       3.1    9.2     24.53   0.48   >2.48    3.074        12.3
SSA22b C32      1.0    6.8     24.89   0.50   >2.32    3.074        8.7
SSA22a C23      7.6    15.3    25.09   0.55   >2.17    3.075        7.3
SSA22a C14      2.1    13.5    25.07   0.79   >2.33    3.081        7.4
SSA22a D13      3.7    14.5    21.59   0.35   2.28     3.083        QSO
SSA22b D27      7.3    6.2     25.13   0.29   2.18     3.086        7.1
SSA22a D2       4.8    9.7     23.39   0.95   2.61     3.086        34.6
SSA22a C30      8.2    16.4    24.67   0.53   >2.36    3.090        10.7
SSA22a MD31     6.0    16.3    23.30   0.44   1.84     3.090        38.0
SSA22a MD20     5.9    12.6    24.15   0.50   1.67     3.091        17.3
SSA22a M9       4.9    15.1    24.75   0.83   >2.06    3.094        10.0
SSA22a C5       2.6    9.6     23.64   0.78   >2.93    3.099        27.5
SSA22a S1^d     1.4    8.9     26.5:   ...    ...      3.100        ...
SSA22a C19      7.8    15.0    24.07   0.99   >2.53    3.103        18.6
SSA22a C8       6.4    10.7    24.17   0.47   >2.96    3.108        17.0
SSA22a C9       4.0    10.9    24.41   0.50   >2.82    3.108        13.7

^a In arc minutes, referring to the coordinate system shown in Figure 2
^b Defined by the position of the Lyman α emission line
^c Star formation rate estimated from the UV continuum, assuming zero reddening,
Ω_M = 0.2, Ω_Λ = 0, H_0 = 70 km s^-1 Mpc^-1, and a Salpeter IMF evaluated from
0.1 M_⊙ to 100 M_⊙
^d Object found serendipitously; extremely faint in the continuum

Table 2. Summary of Statistics for Different Cosmological Models

                         Ω_M = 1                  Ω_M = 0.2                 Ω_M = 0.3, Ω_Λ = 0.7
                         b = 4   b = 6   b = 8    b = 1   b = 1.5  b = 2    b = 2   b = 3   b = 4
Halo mass^a              0.9     3       6        0.04    0.6      2        0.1     0.8     2
Mass scale^b             8.2     6.9     6.2      14      11       8.9      14      11      9.8
δ_mass^c                 0.68    0.49    0.38     2.2     1.6      1.3      1.1     0.86    0.68
δ_mass,L^d               0.44    0.35    0.29     0.80    0.71     0.64     0.60    0.51    0.44
δ_mass,L/σ_Gauss^e       4.9     3.7     3.0      3.2     2.7      2.3      4.3     3.4     2.8
δ_mass,L/σ_STH^f         3.6     2.7     2.1      2.6     2.2      1.9      3.3     2.6     2.2
N_pk^g                   10^-4   0.03    0.20     0.02    0.09     0.22     0.001   0.03    0.14
N_pk (alternate)^h       0.003   0.07    0.33     0.03    0.14     0.35     0.006   0.07    0.23

^a Mass of the dark halos that harbor individual Lyman-break galaxies, derived from the bias
assuming the n = 1, Ω_M = 1 CDM spectrum discussed in the text. Units of 10^12 M_⊙.
^b Mass scale of "spike" structure, in units of 10^14 M_⊙.
^c True mass overdensity associated with spike.
^d Linear mass overdensity associated with spike.
^e δ_mass,L/σ_Gauss, where σ_Gauss is the rms relative mass fluctuation in the density field
smoothed by a Gaussian on mass scale M.
^f δ_mass,L/σ_STH, where σ_STH is the rms relative mass fluctuation in the density field smoothed
by a spherical top-hat on mass scale M.
^g Expected number of peaks of height δ_mass,L in Gaussian-smoothed density field, per
surveyed volume. From BBKS.
^h Alternate estimate of expected peak number density.




Fig. 1. — The redshift histogram of the 67 color-selected objects with z > 2 that have been 
confirmed spectroscopically in an 8.7' by 17.6' strip. The dotted curve represents the smoothed
redshift selection function obtained to date for our overall survey, normalized so as to have the 
same number of total galaxies as the SSA22 sample. Approximately one-third of the confirmed 
redshifts are from the SSA22 fields. The binning here is arbitrary; the formal boundaries of any 
"features" in the redshift distribution used for analysis were obtained using the method described 
in §3 and in the appendix. 
Fig. 2. — Left panel ("SSA22A+B, With Redshifts"): Distribution on the plane of the sky of all Lyman break objects with redshifts
z > 2. The 16 objects with ⟨z⟩ = 3.090 ± 0.02 are circled (the object with a circle but no "dot" is
SSA22a S1, which was found serendipitously); the two QSOs are indicated by stars. Right panel ("SSA22A+B, All Candidates"):
Same as left panel, but here the distribution of all color selected (using the same criteria as for the
spectroscopic sample on the left panel) candidate z ~ 3 galaxies on the plane of the sky is shown.
Again, objects known to be part of the structure at z = 3.09 are circled. The dotted region in both
panels shows the approximate location and size of the SSA22 Hawaii Deep Survey field (Cowie et
al. 1996).
Fig. 3. — Spectra of the 16 objects in the redshift interval 3.07 < z < 3.11. The positions of some
of the features often seen in the spectra of high-redshift star-forming galaxies are indicated; not all
of these features are evident in every spectrum. The absolute flux scale is only approximate; the
photometry for each object is given in Table 1. The spectra have been smoothed by a kernel of
width 12 Å (the spectral resolution) for display.
Fig. 4. — Reconciling the observed galaxy overdensity with various cosmological scenarios. The
smooth curves show the expected number of peaks in this field with linear overdensity greater
than δ_L. The shaded box shows the approximate peak height and number density deduced from
our observations. The actual box position is slightly different for each of the cosmologies, but this
difference is insignificant for our purposes. If a curve passes through the box then the corresponding
parameter values are at least roughly consistent with the existence of the spike in the redshift
histogram. The x range of the box is a 68% confidence interval that takes into account only the
(Poisson) uncertainties in δ_gal; the y range of the box 0.3 < n_pk < 2.5 is also an approximately
68% interval on the number density of similar structures at z ~ 3, based on the fact that we have
detected one such overdensity in the first densely-sampled field. The two smooth curves for each
cosmology give an idea of the uncertainty in n_pk. The upper curve applies if the true mass of
the structure is 1σ lower than our best estimate and the normalization σ_8 is 1σ higher than Eke
et al.'s best estimate. The lower curve applies if the mass of the structure is 1σ higher and the
normalization 1σ lower than the best estimates. See text.
Fig. 5. — The spectrum of QSO SSA22a D14, with the metal line absorption systems corresponding
to the redshifts of the three most significant features in the SSA22a+b redshift histogram marked
(see text).



